Cost-based Dynamic Job Rescheduling: A Case Study of the Intel Distributed Computing Platform

Authors

  • Linh T.X. Phan
  • Zhuoyao Zhang
  • Saumya Jain
  • Harrison Duong
  • Godfrey Tan
  • Boon Thau Loo
  • Insup Lee
Abstract

We perform a trace-driven analysis of the Intel Distributed Computing Platform (IDCP), an Internet-scale, data-center-based distributed computing platform developed by Intel Corporation for massively parallel chip simulations within the company. IDCP has been operational for many years and is currently deployed live on tens of thousands of machines distributed globally across various data centers. Our analysis uses job execution traces collected over a one-year period from tens of thousands of IDCP machines in 20 different pools. It demonstrates that job completion time can be severely degraded by job suspension, which occurs when higher-priority jobs preempt lower-priority jobs. We then develop cost-based dynamic job rescheduling strategies that adaptively restart suspended jobs, which better utilize system resources and improve completion times. Our trace-driven evaluation shows that dynamic rescheduling enables IDCP to significantly reduce job completion times.
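
The idea of a cost-based restart decision for a suspended job can be illustrated with a minimal sketch. The abstract does not specify the cost model, so everything below is an assumption made for illustration: the `SuspendedJob` fields, the two cost functions, and the rule "restart when discarding progress and requeueing is cheaper than waiting to resume" are hypothetical and are not the authors' strategy.

```python
from dataclasses import dataclass


@dataclass
class SuspendedJob:
    """Hypothetical record for a job preempted by higher-priority work."""
    progress_min: float       # work already completed, lost if the job restarts
    expected_wait_min: float  # estimated time until the job would resume in place
    queue_wait_min: float     # estimated queueing delay on an alternative machine


def restart_cost(job: SuspendedJob) -> float:
    # Assumed model: restarting discards completed work and pays a queueing delay.
    return job.progress_min + job.queue_wait_min


def wait_cost(job: SuspendedJob) -> float:
    # Assumed model: staying suspended costs the expected time until resumption.
    return job.expected_wait_min


def should_restart(job: SuspendedJob) -> bool:
    # Restart only when it is strictly cheaper than waiting.
    return restart_cost(job) < wait_cost(job)
```

Under these assumptions, a job with little progress and a long expected suspension is restarted elsewhere, while a job that is nearly finished is left to resume in place.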


Similar resources

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

Dynamic grid scheduling with job migration and rescheduling in the GridLab resource management system

Grid computing has become one of the most important research topics to appear in the field of computing in recent years. Simultaneously, we have noticed the growing popularity of new Web-based technologies which allow us to create application-oriented Grid middleware services providing capabilities required for dynamic resource and job management, monitoring, security, etc. Consequently, ...

Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.

Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains. In this paper, we introduce a novel platform for parallel computing using MPI and OpenMP, based on a set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...

A dynamic rescheduling algorithm for resource management in large scale dependable distributed systems

Scheduling is the key to distributed application performance in large scale heterogeneous environments. For such systems, resilience in case of faults can be approached at the level of rescheduling mechanisms. The performance of rescheduling is very important in the context of large scale distributed systems and dynamic behavior. The paper proposes a generic rescheduling algorithm, which can be...

Smart load shedding and distributed generation resources rescheduling to improve distribution system restoration performance

After a permanent fault occurs, if it is not possible to supply the load in the network, the optimal load restoration scheme allows the system to restore the load with the lowest exit cost, the lowest load interruption, and in the shortest possible time. This article introduces a new design called Smart Load Shedding, abbreviated SLS. In the proposed SLS scheme, the types of devices in smart...


Publication date: 2010